Polarity classification for Spanish tweets using the COST corpus

Authors

  • Eugenio Martínez-Cámara
  • Maria Teresa Martín-Valdivia
  • Luis Alfonso Ureña López
  • Ruslan Mitkov
Abstract

It was not until 2010 that businesses, politicians and people in general began to realise the potential of Twitter in Spain. This has awakened research interest in extracting knowledge from Twitter. This paper aims to address the lack of resources for Twitter sentiment analysis in Spanish by studying different features and machine learning algorithms for classifying the polarity of Twitter posts. The result is a new corpus of Spanish tweets called COST, on which we have carried out wide-ranging experiments with different machine learning algorithms. Furthermore, we have tested the influence of different weighting schemes for unigrams, of eliminating stop-words and of applying a stemming process.
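As an illustration of the feature variations named in the abstract (unigram weighting schemes, stop-word removal, stemming), the following is a minimal sketch in plain Python. It is not the authors' implementation: the stop-word list, the suffix-stripping `toy_stem` function, the example tweets and the particular weighting formulas (binary, raw term frequency, and TF-IDF with an unsmoothed logarithmic IDF) are all assumptions chosen for illustration, and the abstract does not say which schemes or tools were actually used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"el", "la", "de", "que", "y", "es", "me"}

# Crude suffix stripper standing in for a real Spanish stemmer (assumption).
SUFFIXES = ("ando", "iendo", "ar", "er", "ir", "os", "as", "es", "o", "a", "s")


def tokenize(text):
    """Lowercase the text and split it into unigrams (runs of word characters)."""
    return re.findall(r"\w+", text.lower())


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=False, stem=False):
    """Tokenize, then optionally drop stop-words and stem each token."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def weight(docs, scheme="tfidf"):
    """Return one {unigram: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each unigram.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if scheme == "binary":
            vec = {t: 1.0 for t in counts}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in counts.items()}
        elif scheme == "tfidf":
            vec = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors


# Made-up example tweets, not taken from the COST corpus.
tweets = [
    "Me encanta la película, es muy muy buena",
    "La película es mala y aburrida",
]
docs = [preprocess(t, remove_stop_words=True, stem=True) for t in tweets]
binary_vecs = weight(docs, "binary")
tf_vecs = weight(docs, "tf")
tfidf_vecs = weight(docs, "tfidf")
```

Each resulting vector could be fed to any standard classifier. Toggling `remove_stop_words` and `stem`, and switching `scheme`, gives the kind of experimental grid the abstract describes.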

Similar resources

Fuzzy Sentiment Analysis Using Spanish Tweets

Opinion mining and sentiment analysis are two topics of growing interest in artificial intelligence. In recent years, research in these areas has grown in complexity and sophistication, with tasks such as opinion retrieval, sentiment extraction, classification and summarization being carried out using text mining and natural language processing techniques. Possible strategies go from c...

DeustoTech Internet at TASS 2015: Sentiment Analysis and Polarity Classification in Spanish Tweets

This paper describes TASS 2015, the fourth edition of the Workshop on Sentiment Analysis at SEPLN. The main objective is to promote the research and development of new algorithms, resources and techniques in the field of sentiment analysis in social media (specifically Twitter), focused on the Spanish language. This paper presents the TASS 2015 proposed tasks, the contents of the generated corp...

Classification Of Spanish Election Tweets (COSET) 2017 : Classifying Tweets Using Character and Word Level Features

This paper describes the International Institute of Information Technology of Hyderabad’s submission to the task Classification Of Spanish Election Tweets (COSET) as a part of IBEREVAL-2017[1]. The task is to classify Spanish election tweets into political, policy, personal, campaign and other issues. Our system uses Support Vector Machines with radial basis function kernel to classify tweets. ...

TASS: A Naive-Bayes strategy for sentiment analysis on Spanish tweets / TASS: Una estrategia Naive-Bayes para el análisis del sentimiento en tweets en español

This article describes the strategy underlying the system presented by our team for the sentiment analysis task at TASS 2013. The system is mainly based on a naive-bayes classifier for detecting the polarity of Spanish tweets. The experiments have shown that the best performance is achieved by using a binary classifier distinguishing between just two sharp polarity categories: positive and nega...

Short Text Classification Using Deep Representation: A Case Study of Spanish Tweets in Coset Shared Task

Topic identification, as a specific case of text classification, is one of the primary steps toward knowledge extraction from raw textual data. In such tasks, words are treated as a set of features. Due to the high dimensionality and sparseness of the feature vectors that result from traditional feature selection methods, most of the proposed text classification methods for this purpose lack performance...

Journal:
  • J. Information Science

Volume 41

Publication date: 2015